A novel post-training recipe significantly improves math, chat, instruction-following, and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks.
Whereas current editing models exhibit degradation in character consistency and stability across multiple turns, FLUX.1 Kontext shows improved preservation of objects and characters, leading to greater robustness in iterative workflows.
SigLIP 2 is a family of new multilingual vision-language encoders that builds on the success of the original SigLIP, extending the original image-text training objective by combining several prior, independently developed techniques into a unified recipe.
This work introduces GR00T N1, an open foundation model for humanoid robots that outperforms state-of-the-art imitation learning baselines on standard simulation benchmarks across multiple robot embodiments, and deploys it on the Fourier GR-1 humanoid robot for language-conditioned bimanual manipulation tasks.
With an improved framework for model development and evaluation, a large language model is shown to provide answers to medical questions that are comparable to, or preferred over, those provided by human physicians.
A guideline for transparently reporting the use of AI in any manuscript is presented; it will evolve over time as technology, systems, and behaviour evolve.
This work improves existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales and presents a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities and enables a bidirectional flow of information between image and text tokens.
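The "biasing towards perceptually relevant scales" above is achieved by sampling flow timesteps from a logit-normal distribution instead of uniformly, concentrating training signal at intermediate noise levels. A minimal sketch (the function name and default parameters are illustrative, not from the paper):

```python
import math
import random

def logit_normal_timestep(mean: float = 0.0, std: float = 1.0) -> float:
    """Sample a rectified-flow timestep t in (0, 1) from a logit-normal
    distribution: draw in logit space, then map through a sigmoid."""
    u = random.gauss(mean, std)        # Gaussian sample in logit space
    return 1.0 / (1.0 + math.exp(-u))  # sigmoid maps R -> (0, 1)

# Uniform sampling treats all noise scales equally; the logit-normal
# bias places more mass on mid-range t, the perceptually relevant
# scales, while still covering the full (0, 1) interval.
samples = [logit_normal_timestep() for _ in range(10_000)]
mid_fraction = sum(0.25 < t < 0.75 for t in samples) / len(samples)
```

With `mean=0.0, std=1.0`, roughly 70% of samples land in the middle half of the interval, versus 50% for uniform sampling; shifting `mean` skews training toward noisier or cleaner timesteps.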
The new AlphaFold model demonstrates substantially improved accuracy over many previous specialized tools: far greater accuracy for protein–ligand interactions compared with state-of-the-art docking tools, much higher accuracy for protein–nucleic acid interactions compared with nucleic-acid-specific predictors and substantially higher antibody–antigen prediction accuracy.
Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters, delivers the best performance for its size and even offers competitive alternatives to models 2-3 times bigger.
OpenVLA, a 7B-parameter open-source VLA trained on a diverse collection of 970k real-world robot demonstrations, is introduced, and it is shown that OpenVLA can be effectively fine-tuned for new settings, with especially strong generalization results in multi-task environments involving multiple objects and strong language grounding abilities.
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models, and presents comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development.
Results show that TorchDynamo is able to capture graphs more robustly than prior approaches while adding minimal overhead, and that TorchInductor provides a 2.41× training geometric mean speedup on an NVIDIA A100 GPU across 180+ real-world models, outperforming six other compilers.
Recent improvements to Job Dispatcher are overviewed, including its brand new website and documentation, enhanced visualisations, improved job management, and a rising trend of user reliance on the service from low- and middle-income regions.
The Cosmos World Foundation Model Platform is presented to help developers build customized world models for their Physical AI setups; it positions a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications.
Two below-threshold surface code memories on Willow, a distance-7 code and a distance-5 code integrated with a real-time decoder, indicate device performance that, if scaled, could realize the operational requirements of large-scale fault-tolerant quantum algorithms.
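"Below threshold" means that increasing the code distance suppresses the logical error rate exponentially rather than amplifying physical errors. A toy sketch of this scaling law with illustrative numbers (the constants are not the paper's measured values):

```python
def logical_error_rate(d: int, p_ratio: float, a: float = 0.1) -> float:
    """Illustrative surface-code scaling: below threshold (p_ratio < 1),
    the logical error rate per round falls exponentially in distance d,

        p_L ~ a * p_ratio ** ((d + 1) // 2)

    so each +2 in distance multiplies the error rate by p_ratio."""
    return a * p_ratio ** ((d + 1) // 2)

# With p_ratio = 0.5, going d=3 -> 5 -> 7 halves the logical error
# rate at each step, which is what makes scaling up worthwhile.
rates = {d: logical_error_rate(d, p_ratio=0.5) for d in (3, 5, 7)}
```

The key qualitative point matches the summary: only when the physical error rate sits below threshold does adding qubits (larger `d`) buy exponentially better memories, which is what the distance-5 versus distance-7 comparison on Willow demonstrates.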
The development of TRIPOD+AI is described, the expanded 27-item checklist with a more detailed explanation of each reporting recommendation is presented, and the TRIPOD+AI for Abstracts checklist is presented.
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2, a large model that significantly outperforms other models of comparable size and makes the model weights available under an OpenRAIL license.
OLMo, a competitive, truly open language model, is built to enable the scientific study of language models; it is hoped this release will empower the open research community and inspire a new wave of innovation.
This work introduces Tulu 3, a family of fully-open state-of-the-art post-trained models, alongside its data, code, and training recipes, serving as a comprehensive guide for modern post-training techniques.
PaliGemma is an open vision-language model, based on the SigLIP-So400m vision encoder and the Gemma-2B language model, that achieves strong performance on a wide variety of open-world tasks.
The model, called CUT3R (Continuous Updating Transformer for 3D Reconstruction), captures rich priors of real-world scenes: not only can it predict accurate pointmaps from image observations, but it can also infer unseen regions of the scene by probing at virtual, unobserved views.
The development and implementation of DNA barcoding for the Darwin Tree of Life Project (DToL), which aims to sequence and assemble high quality reference genomes for all eukaryotic species in Britain and Ireland, is described.
The results show the significant potential of AI in personalizing learning, automating routine tasks, and providing access to knowledge, but also reveal serious risks of exacerbating social inequality and ethical dilemmas.
Code development continues in line with the Galaxy Project roadmap, with improvements to job scheduling and the user interface, general-purpose graphics processing unit (GPGPU) access for cutting-edge methods, and licensed tool support.
The status of InterPro is reported on, detailing new developments in the database, associated web interface and software, including the increased integration of structures predicted by AlphaFold and the enhanced description of protein families using artificial intelligence.
AlphaEvolve is an evolutionary coding agent that substantially enhances capabilities of state-of-the-art LLMs on highly challenging tasks such as tackling open scientific problems or optimizing critical pieces of computational infrastructure.
Despite its compact 3.8-billion-parameter size, this experimental version of Phi-4-Mini achieves reasoning performance on par with or surpassing significantly larger models, including DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Llama-8B.
This work introduces DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance, along with DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance in long-context scenarios.
This work builds the first Multi-Agent System Failure Taxonomy (MAST), derived from a comprehensive dataset of 1600+ annotated traces collected across 7 popular MAS frameworks, and develops an LLM-as-a-Judge pipeline with high agreement with human annotations to enable scalable annotation.
A new measure of firm-level AI investments is proposed, using a unique combination of worker resume and job postings datasets, which reveals a stark increase in AI investments across sectors.
GPT-4 significantly outperforms both human test-takers and prior models, demonstrating a 26% increase over ChatGPT and beating humans in five of seven subject areas; these results document not just the rapid and remarkable advance of large language model performance generally, but also the potential for such models to support the delivery of legal services in society.
Aurora, a large-scale foundation model trained on more than one million hours of diverse geophysical data, outperforms operational forecasts in predicting air quality, ocean waves, tropical cyclone tracks and high-resolution weather, all at orders of magnitude lower computational cost.
A novel language model architecture is studied that scales test-time computation by reasoning implicitly in latent space: iterating a recurrent block lets the model unroll to arbitrary depth at test time.
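The mechanism above can be sketched in miniature: apply the same block repeatedly, choosing the iteration count at inference time. The toy update below is a hypothetical stand-in (real models use a transformer block here); the point is that compute scales with depth without adding parameters.

```python
def recurrent_block(state: list[float], x: list[float],
                    alpha: float = 0.5) -> list[float]:
    """One hypothetical recurrent step in latent space: blend the
    current latent state with the input embedding. A real model would
    apply a learned transformer block; this toy update just contracts
    the state toward x."""
    return [alpha * s + (1 - alpha) * xi for s, xi in zip(state, x)]

def latent_reasoning(x: list[float], depth: int) -> list[float]:
    """Unroll the *same* block `depth` times at test time; the same
    weights yield more computation at higher depth."""
    state = [0.0] * len(x)
    for _ in range(depth):
        state = recurrent_block(state, x)
    return state

shallow = latent_reasoning([1.0, 2.0], depth=2)   # coarse latent state
deep = latent_reasoning([1.0, 2.0], depth=32)     # refined latent state
```

In this contractive toy, more iterations drive the state closer to a fixed point, mirroring the paper's picture of deeper unrolling refining the latent "reasoning" state at a cost chosen per query.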
Extensive evaluation shows that Kimi-Audio achieves state-of-the-art performance on a range of audio benchmarks including speech recognition, audio understanding, audio question answering, and speech conversation.
An extensive evaluation of 60 LLMs shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%, which underscores the need for further advancements in this area.
An AI co-scientist is introduced, a multi-agent system built on Gemini 2.0 intended to help uncover new, original knowledge and to formulate demonstrably novel research hypotheses and proposals, building upon prior evidence and aligned to scientist-provided research objectives and guidance.
TRIPOD-LLM (transparent reporting of a multivariable model for individual prognosis or diagnosis, adapted for large language models) is a checklist of items considered essential for good reporting of studies that develop or evaluate an LLM for use in healthcare settings; it is a 'living guideline' that emphasizes transparency, human oversight, and task-specific performance reporting.
It is shown that language models trained at scale on evolutionary data can generate functional proteins that are far away from known proteins, and ESM3, a frontier multimodal generative language model that reasons over the sequence, structure, and function of proteins is presented.
This study reviews the techniques and tools used for automatic disease identification, state-of-the-art DL models, and recent trends in DL-based image analysis, and evaluates various DL architectures, providing guidance on the suitability of these models for production environments.
Article Galaxy Pages is a free service from Research Solutions, a company that offers access to content in collaboration with publishing partners, online repositories and discovery services.